episodic reinforcement learning
Constrained episodic reinforcement learning in concave-convex and knapsack settings
Our approach relies on the principle of optimism under uncertainty to explore efficiently. Our learning algorithms optimize their actions with respect to a model based on the empirical statistics, while optimistically overestimating rewards and underestimating the resource consumption (i.e., overestimating the distance from the constraint).
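The optimism principle described above can be sketched as a confidence-bonus calculation: rewards get the bonus added, consumption gets it subtracted. This is a minimal illustrative sketch, not the paper's exact construction; the function name and the Hoeffding-style bonus are assumptions.

```python
import numpy as np

def optimistic_estimates(reward_sum, cost_sum, counts, t, delta=0.05):
    """Optimistic model estimates for a constrained exploration problem.

    Under-explored actions receive a larger confidence bonus, so they
    look both more rewarding and further from the resource constraint,
    which drives the algorithm to try them.
    """
    n = np.maximum(counts, 1)
    # Hoeffding-style confidence width (an assumed, generic choice).
    bonus = np.sqrt(np.log(2 * len(counts) * t / delta) / (2 * n))
    optimistic_reward = reward_sum / n + bonus               # overestimate rewards
    optimistic_cost = np.maximum(cost_sum / n - bonus, 0.0)  # underestimate consumption
    return optimistic_reward, optimistic_cost
```

The algorithm would then act greedily with respect to these optimistic quantities, subject to the (optimistically loosened) constraint.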
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Thompson Sampling is one of the most effective methods for contextual bandits and has been generalized to posterior sampling for certain MDP settings. However, existing posterior sampling methods for reinforcement learning are either model-based or lack worst-case theoretical guarantees beyond linear MDPs. This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees. We introduce novel proof techniques to show that under suitable conditions, the worst-case regret of our posterior sampling method matches the best known results for optimization-based methods. In the linear MDP setting, the regret of our algorithm scales linearly with the dimension, compared to the quadratic dependence of existing posterior sampling-based exploration algorithms.
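The posterior-sampling principle the paper generalizes is easiest to see in its classical bandit form: sample a model from the posterior, then act greedily with respect to the sample. The Beta-Bernoulli sketch below is an illustrative assumption, not the paper's MDP formulation.

```python
import numpy as np

def thompson_step(successes, failures, rng):
    """One step of Thompson Sampling for a Bernoulli bandit.

    Draws a mean-reward estimate for each arm from its Beta posterior
    (with a uniform Beta(1, 1) prior), then plays the arm whose sampled
    mean is largest. Randomness in the posterior draw supplies the
    exploration.
    """
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))
```

In the MDP generalization, the sampled quantity is (a representation of) the value function or model rather than per-arm means, but the sample-then-act-greedily structure is the same.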
Steady State Analysis of Episodic Reinforcement Learning
Reinforcement Learning (RL) tasks generally divide into two kinds: continual learning and episodic learning. The concept of steady state has played a foundational role in the continual setting, where a unique steady-state distribution is typically presumed to exist in the task being studied; this enables a principled conceptual framework as well as efficient data collection methods for continual RL algorithms. On the other hand, the concept of steady state has been widely considered irrelevant for episodic RL tasks, in which the decision process terminates in finite time. Alternative concepts, such as episode-wise visitation frequency, are used in episodic RL algorithms; these are not only inconsistent with their counterparts in continual RL, but also make it harder to design and analyze RL algorithms in the episodic setting. In this paper we prove that unique steady-state distributions pervasively exist in the learning environment of episodic learning tasks, and that the marginal distributions of the system state indeed approach the steady state in essentially all episodic tasks.
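The key observation, that an episodic task restarted at termination behaves like a recurrent chain, can be checked numerically: power-iterating the marginal state distribution drives it to a unique fixed point. The toy transition matrix below is an assumption for illustration only.

```python
import numpy as np

def stationary_distribution(P, iters=10_000):
    """Power-iterate the marginal state distribution of a Markov chain.

    Starting from a uniform marginal, repeatedly applying the transition
    matrix converges to the unique steady-state distribution when the
    chain is ergodic (irreducible and aperiodic).
    """
    mu = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        mu = mu @ P
    return mu

# A 3-state episodic task: state 2 is terminal and resets to state 0.
# The reset edge makes the combined "learning environment" chain recurrent.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
```

Here the reset edge (terminal state back to the initial state) is exactly what restores ergodicity to a finite-horizon process.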
Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.
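For context, the classical value-function gap that these bounds refine is gap(s, a) = V*(s) - Q*(s, a). A minimal sketch of computing it from an optimal Q-table (the helper name is an assumption; the paper's alternative gap definitions additionally discount states an optimal policy never reaches):

```python
import numpy as np

def action_gaps(Q):
    """Compute gap(s, a) = V*(s) - Q*(s, a) from an optimal Q-table.

    V*(s) is the max over actions of Q*(s, a), so the gap measures how
    suboptimal each action is in each state; classical instance-dependent
    regret bounds scale inversely with the smallest positive gap.
    """
    V = Q.max(axis=1, keepdims=True)  # V*(s) = max_a Q*(s, a)
    return V - Q
```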
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
Review for NeurIPS paper: Steady State Analysis of Episodic Reinforcement Learning
Clarity: In my opinion the main weakness of the paper is its presentation. First, there is a lack of clear, direct, explanations of what the paper is trying to accomplish. Several crucial points are either only implied or mentioned in passing without the proper emphasis. This is true for the positioning of the paper itself. The analysis seems to be mostly concerned with policy gradient methods, but this is never explicitly stated.
Review for NeurIPS paper: Steady State Analysis of Episodic Reinforcement Learning
This paper provides a new perspective on thinking about episodic RL, and should be of interest to anyone working with MDPs in reinforcement learning. Three reviewers (R1, R2, R3) commented that it was well-written and clear, although R4 disagreed. All reviewers commented on the interesting contributions (proving that the MDPs underlying episodic RL tasks are ergodic). R1, R2, and R3 had concerns that it was a mostly theoretical paper, and wondered how to practically apply these insights. However, the rebuttal goes some way toward addressing these points, and R4 was convinced to raise their recommendation to weak accept.
Review for NeurIPS paper: A Unifying View of Optimism in Episodic Reinforcement Learning
Weaknesses: While I like the duality result, I find this paper is not substantial enough to merit acceptance. This paper shows that a class of model-optimistic algorithms can be implemented efficiently (with minor modifications). However, none of the state-of-the-art algorithms is model-optimistic. This is somewhat inherent to this class of algorithms, because the transition model scales with S^2, whereas the optimal bounds scale linearly in S via value-optimistic algorithms. Value-optimistic algorithms are not only more computationally efficient but also more statistically efficient. So making model-optimistic algorithms more efficient is not a very significant result.
Review for NeurIPS paper: A Unifying View of Optimism in Episodic Reinforcement Learning
The reviewers are in agreement that this is interesting and well-presented work. The main concern was about the extent to which the results will help us derive SOTA algorithms in the future. I find the contribution reasonable without this and hope the community will figure out how/if these results are useful. Please do take the reviewers' minor suggestions into consideration when preparing a final version.